Student name: Astrid Luyao He
Student number: 12968706
Course: Digital Analytics
Lecturers: Dhr. dr. T.B. (Theo) Araujo & Dr J. (Joanna) Strycharz
Date: 21/Mar/2021
TikTok has been trending in recent years. The app enjoys high name recognition among young and middle-aged people and has been described as one of the “hottest” apps on Earth (The New York Times, 2019; Forbes, 2020). In January 2021, TikTok and its Chinese counterpart Douyin had 1.289 billion active users combined (Tankovska, 2021). This number exceeds the user bases of Instagram and WeChat and is approaching that of Facebook, making TikTok the most popular social media platform developed by a Chinese company.
The platform’s popularity brings enormous income to its owner, ByteDance. The company earned CNY 180 billion (€23.2 billion) in advertising revenue in 2020, 60% of which was generated by Douyin (Reuters, 2020). Douyin’s high advertising income stems from its large user base: advertisements on the platform are more likely to be exposed to the “right” users, so advertisers are more willing to invest in it. The effectiveness of the ads is also crucial. If ads on Douyin had, on average, lower returns (e.g., click-through rate) than other social media platforms with similar user numbers, advertisers might turn to those platforms instead. More directly, some advertisements on Douyin are sold under CPC (cost per click) contracts, which means a higher click-through rate brings higher income to the platform.
For these reasons, ByteDance and Douyin should invest more in discovering the earning power of different advertisement features. One such feature is the number of in-feed advertisements that users see continuously, in a row, in the For you section. Currently, Douyin pushes up to six ads in a row to its users. The company wants to discover whether pushing fewer ads would improve the advertisements’ effectiveness.
Research Question: To what extent does the number of advertisements on Douyin seen by the users continuously influence the effectiveness of the ads?
In the For you section of Douyin, users see in-feed content pushed by the platform according to their preferences, containing both advertisements and non-commercial content. The advertisements are indicated only by a small, semi-transparent grey mark at the bottom of the page, which is easy for users to miss.
When seeing the first advertisement pushed by the platform, users may show levels of recognition similar to when they watch non-commercial content. As the number of consecutive ads increases, users are, on the one hand, more likely to realize that the content is advertising. On the other hand, once users have realized they are watching ads, they may become increasingly impatient and focus on skipping the ads. According to the Elaboration Likelihood Model (Petty & Cacioppo, 1986), when audiences have a low level of elaboration, they tend to process information via the peripheral route. In that situation, the information in the ads is not processed properly, and the ads may be less effective. Thus, the main hypothesis of the test is:
H1: The increased number of advertisements seen by users in a row will negatively influence the effectiveness of the ads.
In this A/B test, the independent variable is the number of advertisements seen by Douyin users in a row. The dependent variable is the effectiveness of the advertisements. The IV of the test is straightforward, but the DV of the test needs more elaboration.
There are many ways to measure advertisement effectiveness, including cognitive outcomes (e.g., testing audiences’ memory, attitudes and opinions, or their intention to purchase and seek information) and behavioral outcomes (e.g., measuring sales and inquiries, or click-through effectiveness), as well as content analysis (Amos, Holmes & Strutton, 2008; Bleier & Eisenbeiss, 2015; Lucas & Britt, 1963; Nasiri, Sammaknejad & Sabetghadam, 2020; Segijn & Eisend, 2019). For this A/B test, users’ cognitive and psychological outcomes would be hard to collect. Some behavioral outcomes, such as sales, would also be difficult to acquire, as sales data can only be accessed from the advertisers’ side. Therefore, in this test, the advertisements’ effectiveness is measured in three ways: the click frequency, the skip frequency, and the watch time of the ads.
H1a: The increased number of advertisements seen by users in a row will negatively influence the click rate of the ads.
H1b: The increased number of advertisements seen by users in a row will positively influence the skip frequency of the ads.
H1c: The increased number of advertisements seen by users in a row will negatively influence the watch time of the ads.
Apart from the main effect, many moderators are identified in the literature, including the category of the ads, the presentation format of the ads, product category, brand familiarity, device category, demographics (e.g., age, gender), the presence of human faces, the presence of celebrities, and the framing of the ads (Amos, Holmes & Strutton, 2008; Celuch, Singley & Slama, 2014; Maček, Bobek, Kubli & Burböck, 2019; Martínez & Mascarua, 2016; Phillips & Stanton, 2004; Segijn & Eisend, 2019; Stewart et al., 2019). Some of these moderators should be controlled before the start of the test (e.g., the presence of human faces), and others can be neutralized by random assignment (e.g., product category). Details on these two categories of moderators can be found in the section Data Gathering. In this A/B test, two moderators are included as control variables: gender and age. The following hypotheses are based on previous research results.
H2: Compared with advertisements watched by males, those watched by females will reach higher effectiveness.
H2a: Compared with advertisements watched by males, those watched by females will have a higher click rate.
H2b: Compared with advertisements watched by males, those watched by females will have lower skip frequency.
H2c: Compared with advertisements watched by males, those watched by females will be played longer.
H3: The increased age of users will negatively influence the effectiveness of the ads.
H3a: The increased age of users will negatively influence the click rate of the ads.
H3b: The increased age of users will positively influence the skip frequency of the ads.
H3c: The increased age of users will negatively influence the watch time of the ads.
To answer the research question and test the hypotheses, Douyin recently conducted an A/B test. In this report, a simulated dataset is used since the actual data is difficult to obtain.
Instead of using A/B test tools developed by other companies (e.g., Google Optimize, Adobe Target), the test was conducted with the company’s internal test tool because some of the data can only be accessed by the company and must be kept secure. There are 30,000 observations in the simulated dataset.
In reality, more users should be involved. For example, one of the primary conversions of the test is getting users to click the advertisements. The average CTR (click-through rate) for Douyin is hard to find, but it can be approximated by its counterpart’s: the average CTR for TikTok is 1.5–3% (The Infinite Agency, 2021), so 2% is taken as the baseline conversion rate. The company wishes to detect a change of at least 5%, which means the MDE (minimum detectable effect) is 5%. Entering these two numbers into an A/B Test Sample Size Calculator shows that the test needs to involve at least 400,000 users.
In order to answer the research question, the test gathered the following data:
The data above can be used to test all the hypotheses.
Some of the variables do not appear in the dataset because they were controlled before the test:
Other variables can also influence the test, but their influences are reduced by randomly assigning ads to the users:
Though an A/B test is efficient and its results are clear and straightforward, it can be problematic from a privacy perspective. The most prominent problem here is that no regulation forces Douyin to request consent from the users participating in the test. Collecting watch time and click frequency during the test is ethical in itself, but the user ids and IP addresses make each individual identifiable. After the test, the data could be used to manipulate and exploit the users, and if the data leaks, it may harm them further.
To conduct the test and use the data ethically, a more explicit consent form should, in reality, be presented to the users. Users under 18 years old should not be allowed to participate. Identifiable personal information, including user id and IP address, should be removed before analyzing the data and should not be used as covariates. The test data should be deleted entirely after a certain period.
The good news is that, since this is an A/B test, users are randomly assigned rather than selected based on the outcome, so the test result is unlikely to introduce biases against particular users in the future.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from sklearn.linear_model import LogisticRegression, LinearRegression
import statsmodels.api as sm
import lime
from lime import lime_tabular
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
users = pd.read_csv('[data]Astrid_Luyao_He_12968706.csv')
# check the dataset
users.head()
# inspect gender column value
users['gender'].value_counts()
There are several different values in this column. The column is recoded, transforming f and female to 1, and m and male to 0.
# create a clearer variable for gender: f/female -> 1, m/male -> 0
users['sex'] = users['gender'].map({'f': 1, 'female': 1, 'm': 0, 'male': 0})
A new column for test conditions is also created. The values in this column show whether the user watched six ads continuously (labeled as 6) or three ads (labeled as 3).
# create a variable for the test condition: 6 when the user saw a fourth ad, else 3
users['ad_4'] = users['ad_4'].fillna(0)
users['condition'] = np.where(users['ad_4'] == 0, 3, 6)
# inspect all the columns to check if there is any incorrect information
users.describe().transpose()
Next, the missing values in the six click-frequency columns are handled. According to the frequency table, about 3,000–5,000 rows are marked as clicked (1.0), and the other rows are missing. To further process the variables, the missing values are replaced with 0, because a missing value indicates that the user did not click the ad.
# fill missing values in click frequency column with 0
users[['click_1', 'click_2', 'click_3', 'click_4', 'click_5', 'click_6']] = users[['click_1', 'click_2', 'click_3', 'click_4', 'click_5', 'click_6']].fillna(0)
For the watch-time columns, the minimum value is 1.0, indicating that the minimum watch time recorded was 1 second. Here the missing values are also replaced with 0, because they represent users who watched an ad for less than 1 second.
# fill missing values in watch time column with 0
users[['watch_1', 'watch_2', 'watch_3', 'watch_4', 'watch_5', 'watch_6']] = users[['watch_1', 'watch_2', 'watch_3', 'watch_4', 'watch_5', 'watch_6']].fillna(0)
The sums of the click-frequency and watch-time columns are calculated to test the hypotheses.
# generate two columns for total click frequency and watch time
users['click_freq'] = users[['click_1', 'click_2', 'click_3', 'click_4', 'click_5', 'click_6']].sum(axis=1)
users['watch_time'] = users[['watch_1', 'watch_2', 'watch_3', 'watch_4', 'watch_5', 'watch_6']].sum(axis=1)
In the following sections, the two conditions will be compared, and the number of ads a user watched may influence their total click frequency and watch time. Therefore, mean values of these two columns are calculated based on the number of ads the user watched in the test.
# generate two columns for average click frequency and watch time
users['mean_click'] = users['click_freq'] / users['condition']
users['mean_watch'] = users['watch_time'] / users['condition']
To visualize and analyze the two new columns (mean_click and mean_watch) more clearly, they are rounded.
# round the new columns
users[['mean_click', 'mean_watch']] = users[['mean_click', 'mean_watch']].round(decimals=2)
25,088 of the 30,000 users in the dataset have data in the sex column, which means nearly 5,000 users did not indicate their gender when registering. Since the analysis includes sex as a covariate, all rows without sex data are deleted here so that regression can be used in the next section.
# delete rows without sex data
users = users.dropna(axis=0, subset=['sex'])
To better measure users’ watch time, another dichotomous variable is created from watch_1 to watch_6. It indicates whether a user watched an ad for less than 3 seconds, as the general content of an ad is hard to grasp in under three seconds.
# create skip variables: an ad counts as skipped when it was watched for less than 3 seconds
# (ads 4-6 only exist in the six-ads condition)
for i in range(1, 7):
    skipped = users[f'watch_{i}'] < 3
    if i > 3:
        skipped &= users['condition'] == 6
    users[f'skip_{i}'] = skipped.astype(int)
# generate two columns for total and average skip frequency, and round the item
users['skip_freq'] = users['skip_1'] + users['skip_2'] + users['skip_3'] + users['skip_4'] + users['skip_5'] + users['skip_6']
users['mean_skip'] = users['skip_freq'] / users['condition']
users['mean_skip'] = users['mean_skip'].round(decimals=2)
Because regressions will be run in the next section to test the hypotheses, a dummy variable for the six-ads condition is created.
# create a dummy variable for the six-ads test condition
users['six_ads'] = (users['ad_4'] != 0).astype(int)
Dummies for age and sex are also created.
For age, the median of the age column (24) is used to split the column into two groups: 1 in younger_24 means the user is 24 years old or younger, and 0 means the user is older than 24.
# create a dummy variable for gender
users['is_female'] = (users['sex'] == 1).astype(int)
# create a dummy variable for age according to its median
users['younger_24'] = (users['age'] <= 24).astype(int)
In the following sections, the confusion matrices may be used to evaluate the models. Therefore, dummies for the DVs are also created so they can be used in logistic regression. Here the three DVs are divided according to the following rules:
# create a dummy for click frequency: 1 when the user clicked nothing
users['click_zero'] = (users['mean_click'] == 0).astype(int)
# create a dummy for skip frequency: 1 when the user skipped nothing
users['skip_zero'] = (users['mean_skip'] == 0).astype(int)
# create a dummy for watch time: 1 when the mean watch time exceeds 9 seconds
users['watch_nine'] = (users['mean_watch'] > 9).astype(int)
After creating the three dummies, visualizations are checked to see whether the dummies are balanced.
sns.countplot(x = users['click_zero'])
sns.countplot(x = users['skip_zero'])
sns.countplot(x = users['watch_nine'])
The visualizations show that the dummies for click frequency and watch time are balanced, but the one for skip frequency is imbalanced. Since mean_skip takes only four values, no other rule can be used to create the dummy. The imbalanced data may cause inaccuracy when skip_zero is used to evaluate a model.
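The balance can also be checked numerically instead of visually. A sketch using a toy dataframe standing in for the real dummies (the values here are synthetic, for illustration only):

```python
import pandas as pd

# toy stand-in for the real dummy columns (synthetic, illustrative only)
users_demo = pd.DataFrame({
    'click_zero': [1, 0, 1, 0, 1, 0, 1, 1],
    'skip_zero':  [1, 1, 1, 1, 1, 1, 1, 0],
    'watch_nine': [0, 1, 0, 1, 1, 0, 0, 1],
})

# share of each class per dummy; shares far from 0.5 signal imbalance
for col in users_demo.columns:
    print(col, users_demo[col].value_counts(normalize=True).to_dict())
```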
To protect users’ privacy, identifiable information (user_id) and other unrelated variables are deleted.
users.columns
# delete unrelated columns
users = users[['age', 'younger_24', 'sex', 'is_female', 'condition', 'six_ads', 'mean_click', 'mean_watch', 'mean_skip', 'click_freq', 'watch_time', 'skip_freq', 'click_zero', 'skip_zero', 'watch_nine']]
users.head()
The data is now clean.
The variables included in the test are:
IV: The test condition (condition) — users watch three or six ads in a row
DVs:
Click frequency (click_freq) — It shows how many times the user clicked the advertisements during the test. For each advertisement, if the user clicked the link, the corresponding variable's value turns to 1.0.
Watch time (watch_time) — It indicates the time the user spent watching the ads. When an advertisement started, the time started to count, and the counting stopped when the user turned to the next video. The time was counted in seconds.
Skip frequency (skip_freq) — It represents how many times a user watched an ad for less than three seconds. This variable is generated from watch_time: if the watch time of an ad was counted as less than 3 seconds, the corresponding variable's value turns to 1.0.
Covariates:
# age
users['age'].describe()
sns.set_theme(style="whitegrid")
sns.boxplot(x = users['age'])
The average age of the users in the dataset is 24.5 years. The youngest participant is 13 years old and the oldest is 37. The majority of participants are between 22 and 27 years old.
# gender
users['sex'].value_counts()
sns.countplot(x = users['sex'])
15,085 female users and 10,003 male users participated in the test. The gender distribution, though not equal, is not severely imbalanced.
The IV of the test indicates whether the participants watched three or six advertisements in a row.
# condition
users['condition'].value_counts()
sns.countplot(x = users['condition'])
A similar number of participants engaged in the two test conditions: 12,702 users watched three ads, and 12,386 users watched six ads during the test. The two conditions of this A/B test are equally distributed.
# click frequency - total
users['click_freq'].value_counts()
sns.countplot(x = users['click_freq'])
In this A/B test, over half (55.7%) of the participants clicked at least one ad while watching the continuous advertisements, indicating a very high CTR (click-through rate) for the platform.
The majority (80%) of users clicked either no ads or one ad.
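The 55.7% figure is the share of users with at least one click; it can be computed directly from the total-click column. A sketch on a toy click_freq series (synthetic values, not the real data):

```python
import pandas as pd

# toy total-click column (synthetic, illustrative only)
click_freq = pd.Series([0.0, 1.0, 0.0, 2.0, 1.0, 0.0, 3.0, 1.0])

# fraction of users who clicked at least once
share_clicked = (click_freq > 0).mean()
print(f'{share_clicked:.1%}')
```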
# click frequency - average
users['mean_click'].value_counts()
sns.countplot(x = users['mean_click'])
The average values show that nearly half (47.7%) of the participants have a mean click frequency of 0.17 or 0.33.
# watch time - total
users['watch_time'].describe()
sns.histplot(x = users['watch_time'], kde=True)
On average, participants spent about 40 seconds watching all the ads in the test. Some users did not care about the ads and spent only about 5 seconds skipping through all of them, while others watched nearly every ad from beginning to end. The histogram shows that the values cluster in two areas: most users spent about 20–35 seconds or 45–63 seconds watching advertisements. This bimodal distribution indicates that total watch time was influenced by the number of ads the users watched, so it is necessary to calculate the mean watch time per ad.
# watch time - average
users['mean_watch'].describe()
sns.histplot(x = users['mean_watch'], kde=True)
In the test, each user spent about 9 seconds on each advertisement. Some users jumped to the next video without grasping the advertisement’s meaning (spending less than 2 seconds on it), while others watched the whole video. This figure is also attractive to advertisers, as it shows that users watch each advertisement for a relatively long time.
# skip frequency - total
users['skip_freq'].value_counts()
sns.countplot(x = users['skip_freq'])
The majority of users (93.5%) in the test did not skip any ad.
# skip frequency - average
users['mean_skip'].value_counts()
sns.countplot(x = users['mean_skip'])
The mean skip possibility also follows the pattern of the overall skip frequency — the majority of users did not jump over any ad during the test.
# condition + click frequency
users.groupby('condition').agg({'mean_click':'mean'})
cond_click = sns.barplot(x = 'condition', y = 'mean_click', data = users)
cond_click.set(ylabel = 'Mean Click Frequency')
The Douyin users who watched six ads in a row had, on average, a higher click frequency than users who watched three ads. The confidence intervals in the plot indicate that although the average click frequencies of the two groups are not high (far below 0.5), the difference between them is statistically significant.
# condition + watch time
users.groupby('condition')['mean_watch'].agg(['min', 'max', 'mean'])
cond_watch = sns.barplot(x = 'condition', y = 'mean_watch', data = users)
cond_watch.set(ylabel = 'Mean Watch Time')
Users in the six-ads condition also showed longer watch times than users in the three-ads condition: they watched each advertisement for nearly one second longer. The confidence intervals show that this difference is also statistically significant. The result indicates that the ads’ effectiveness was influenced by the number of ads users watched in a row during the test.
# condition + skip frequency
users.groupby('condition').agg({'mean_skip':'mean'})
cond_skip = sns.barplot(x = 'condition', y = 'mean_skip', data = users)
cond_skip.set(ylabel = 'Mean Skip Frequency')
Compared with users who watched six ads continuously, those who watched three ads had a considerably higher probability of skipping the ads: the skip probabilities in the two conditions are 3.34% and 0.98%, respectively. According to the confidence intervals, the difference is statistically significant.
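Reading significance off confidence intervals is informal; the condition difference could also be verified with a two-sample test. A sketch using Welch's t-test on synthetic skip indicators with roughly the rates reported above (the arrays are made up for illustration; scipy is assumed to be available):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
# synthetic per-user skip indicators for the two conditions (illustrative only)
skip_three_ads = rng.binomial(1, 0.0334, 12000)  # ~3.34% skip probability
skip_six_ads = rng.binomial(1, 0.0098, 12000)    # ~0.98% skip probability

# Welch's t-test does not assume equal variances across the groups
t_stat, p_value = ttest_ind(skip_three_ads, skip_six_ads, equal_var=False)
print(t_stat, p_value)
```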
# age + click frequency
age_click = sns.lineplot(x = 'age', y = 'mean_click', data = users)
age_click.set(ylabel = 'Mean Click Frequency')
Users of different ages click the link in an ad with similar frequency, though the frequency is slightly higher for users under 15 and around 34 years old. The confidence intervals show that these differences in mean click frequency are slight and not statistically significant.
# age + watch time
age_watch = sns.lineplot(x = 'age', y = 'mean_watch', data = users)
age_watch.set(ylabel = 'Mean Watch time')
The watch times for users between 15 and about 34 years old are similar: about 9 seconds on average. The plot shows that users younger than 15 have a somewhat higher watch time, but the difference is negligible. Users aged 35 also spent longer on each ad, and this difference is statistically significant.
# age + skip frequency
age_skip = sns.lineplot(x = 'age', y = 'mean_skip', data = users)
age_skip.set(ylabel = 'Mean Skip Frequency')
The visualization shows that the differences in skip frequency between age groups are not significant.
# gender + click frequency
users.groupby('sex').agg({'mean_click':'mean'})
gender_click = sns.barplot(x = 'sex', y = 'mean_click', data = users)
gender_click.set(xlabel = 'Gender', ylabel = 'Mean Click Frequency')
The difference in click frequency between genders is slight: female and male users had similar click frequencies during the test.
# gender + watch time
users.groupby('sex')['mean_watch'].agg(['min', 'max', 'mean'])
gender_watch = sns.barplot(x = 'sex', y = 'mean_watch', data = users)
gender_watch.set(xlabel = 'Gender', ylabel='Mean Watch Time')
The difference in watch time between the two gender groups is also tiny. The results suggest that gender has a negligible influence on the effectiveness of advertisements on Douyin.
# gender + skip frequency
users.groupby('sex').agg({'mean_skip':'mean'})
gender_skip = sns.barplot(x = 'sex', y = 'mean_skip', data = users)
gender_skip.set(xlabel = 'Gender', ylabel = 'Mean Skip Frequency')
The visualization shows no evidence that users’ probability of skipping an ad was influenced by their gender.
Overall, the dataset is balanced with respect to users’ gender, age, and test conditions. The test result is unlikely to create biases against particular users, because it would be used to decide how to distribute ads, not to select Douyin users.
The dataset is split into training data (80% of the dataset) and testing data (20%) for testing the hypotheses.
train, test = train_test_split(users, test_size=0.2, random_state=42)
The DVs in the tests are all mean values, and two of them (mean click frequency and mean skip frequency) are computed from categorical variables, so strictly speaking they could be treated as ordered categories. However, because the DVs are calculated quantities with a standardized, ordered scale (the mean click frequency, for instance, has seven categories) and are interpreted as probabilities of performing an action, they are treated here as numerical variables, and linear regression models are used to test the hypotheses.
# H1a: mean click frequency
ols_click = sm.OLS(train['mean_click'], sm.add_constant(train[['six_ads', 'is_female', 'younger_24']]))
result_click = ols_click.fit()
print(result_click.summary())
The regression model predicts the mean click frequency from test condition, gender, and age. The R-squared value is 0.018: the test condition, gender, and age together explain 1.8% of the variance in users’ click frequency, which is not high.
The test condition (six_ads) has a positive and significant (p < .001) effect: users who watched more advertisements in the test were more likely to click the link of the ads, controlling for users’ gender and age.
Gender (is_female), too, has a positive effect on click frequency, albeit a very weak one (b = 0.003, p > .05): being female is predicted to increase the probability of clicking by only 0.003, holding age and test condition constant.
Age (younger_24) positively predicts click frequency, but the effect is too small to detect: b = 0.001, not statistically significant (p > .05).
# H1b: mean skip frequency
ols_skip = sm.OLS(train['mean_skip'], sm.add_constant(train[['six_ads', 'is_female', 'younger_24']]))
result_skip = ols_skip.fit()
print(result_skip.summary())
The regression model predicts the mean skip frequency from test condition, gender, and age. Together they explain only 2.2% of the variance in users’ probability of skipping the ads (R-squared = .022).
The test condition (six_ads) negatively predicts skip frequency, b = -0.024: all else equal, an increase in the number of ads watched predicts a 0.024 decrease in users’ mean skip frequency. This effect is statistically significant, t = -21.42, p < .001.
Gender (is_female) has no effect on the probability of skipping ads, b = -0.001, p > .05: the change in gender decreases the skip probability by a meager 0.001, with the other predictors held constant.
Age (younger_24) is another insignificant predictor of the probability of skipping the ads, b = -0.000. This effect is not statistically significant (t = -0.19, p > .05).
# H1c: mean watch time
ols_watch = sm.OLS(train['mean_watch'], sm.add_constant(train[['six_ads', 'is_female', 'younger_24']]))
result_watch = ols_watch.fit()
print(result_watch.summary())
The regression model predicts the mean watch time from test condition, gender, and age. The R-squared value of 0.06 shows that the model explains 6% of the variance in users’ watch time.
The test condition (six_ads) has a positive coefficient, b = 0.856: holding age and gender constant, the larger number of ads in the test predicts an increase in average watch time of roughly 0.9 seconds per ad (t = 36.40, p < .001).
Gender (is_female) does not significantly predict watch time, b = 0.034, p > .05: the mean watch time among females is on average only 0.03 seconds higher than among males, with the other predictors controlled.
Age (younger_24) has a small positive coefficient (b = 0.011), but this effect is not statistically significant (p > .05).
Main effect
The three regression models imply that the test condition (i.e., the number of advertisements watched in a row during the test) significantly influenced the effectiveness of the ads. Users who watched more ads were more likely to click the links, less likely to skip the ads, and watched each advertisement longer. The data thus shows that an increased number of advertisements watched in a row positively influenced the effectiveness of the ads in the test, so H1 is rejected.
Moderation effect
In all three models, the p-values for gender and age are larger than .05, which means gender and age did not significantly affect the outcomes. The results show that age and gender are not meaningful covariates in the models, and the test data therefore do not support H2 and H3.
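Strictly speaking, H2 and H3 concern moderation, which is tested with interaction terms rather than main effects alone. A sketch of such a model using the statsmodels formula API; the variable names mirror the report's dummies, but the data below is synthetic, for illustration only:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 5000
df = pd.DataFrame({
    'six_ads': rng.integers(0, 2, n),
    'is_female': rng.integers(0, 2, n),
    'younger_24': rng.integers(0, 2, n),
})
# synthetic outcome with a condition main effect and no built-in moderation
df['mean_click'] = 0.15 + 0.05 * df['six_ads'] + rng.normal(0, 0.1, n)

# significant six_ads:is_female or six_ads:younger_24 terms would indicate moderation
model = smf.ols('mean_click ~ six_ads * is_female + six_ads * younger_24', data=df).fit()
print(model.summary())
```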
# predict mean click frequency
ols_click = LinearRegression(fit_intercept = True)
ols_click.fit(train[['six_ads', 'is_female', 'younger_24']], train['mean_click'])
lime_click = train[['six_ads', 'is_female', 'younger_24', 'mean_click']]
feature_names_click = ['six_ads', 'is_female', 'younger_24']
X_lime_click = lime_click[feature_names_click].to_numpy()
y_lime_click = lime_click['mean_click'].to_numpy()
explainer_ols = lime.lime_tabular.LimeTabularExplainer(
    X_lime_click,
    feature_names = feature_names_click,
    class_names = ['click'],
    verbose = True,
    mode = 'regression',
    discretize_continuous = True)
# 3 ads + male + older 24
ols_click.predict([[0,0,0]])
click_3ads1 = explainer_ols.explain_instance(np.array([0,0,0]), ols_click.predict)
click_3ads1.show_in_notebook(show_table=True)
Male users older than 24 in the three-ads condition have about a 14.58% probability of clicking an ad’s link, and an 85.42% probability of not clicking anything. Their probability of clicking is negatively influenced by the test condition, while their gender and age make hardly any difference. Among all the groups, these users have the lowest probability of clicking an ad.
# 3 ads + female + older 24
ols_click.predict([[0,1,0]])
click_3ads2 = explainer_ols.explain_instance(np.array([0,1,0]), ols_click.predict)
click_3ads2.show_in_notebook(show_table=True)
Female users older than 24 in the three-ads condition have about a 14.92% probability of clicking the ad links and an 85.08% probability of not clicking anything. The test condition lowers their probability of clicking; their gender (female) raises it and their age lowers it, both only very slightly.
# 3 ads + male + younger 24
ols_click.predict([[0,0,1]])
click_3ads3 = explainer_ols.explain_instance(np.array([0,0,1]), ols_click.predict)
click_3ads3.show_in_notebook(show_table=True)
Male users younger than 24 in the three-ads condition have about a 14.71% probability of clicking the ad links and an 85.29% probability of not clicking anything. The test condition lowers their probability of clicking; their gender (male) lowers it and their age (< 24 years old) raises it, both only very slightly.
# 3 ads + female + younger 24
ols_click.predict([[0,1,1]])
click_3ads4 = explainer_ols.explain_instance(np.array([0,1,1]), ols_click.predict)
click_3ads4.show_in_notebook(show_table=True)
Female users younger than 24 in the three-ads condition have about a 15.05% probability of clicking the ad links and an 84.95% probability of not clicking anything. The test condition lowers their probability of clicking, while their age and gender raise it only insignificantly.
# 6 ads + male + older 24
ols_click.predict([[1,0,0]])
click_6ads1 = explainer_ols.explain_instance(np.array([1,0,0]), ols_click.predict)
click_6ads1.show_in_notebook(show_table=True)
Male users older than 24 in the six-ads condition have about a 19.53% probability of clicking the ad links and an 80.47% probability of not clicking anything. The test condition raises their probability of clicking, while gender and age make hardly any difference.
# 6 ads + female + older 24
ols_click.predict([[1,1,0]])
click_6ads2 = explainer_ols.explain_instance(np.array([1,1,0]), ols_click.predict)
click_6ads2.show_in_notebook(show_table=True)
Female users older than 24 in the six-ads condition have about a 19.87% probability of clicking the ad links and an 80.13% probability of not clicking anything. The test condition raises their probability of clicking; their gender (female) raises it and their age lowers it, both only very slightly.
# 6 ads + male + younger 24
ols_click.predict([[1,0,1]])
click_6ads3 = explainer_ols.explain_instance(np.array([1,0,1]), ols_click.predict)
click_6ads3.show_in_notebook(show_table=True)
Male users younger than 24 in the six-ads condition have about a 19.66% probability of clicking the ad links and an 80.34% probability of not clicking anything. The test condition raises their probability of clicking; their gender (male) lowers it and their age (< 24 years old) raises it, both only very slightly.
# 6 ads + female + younger 24
ols_click.predict([[1,1,1]])
click_6ads4 = explainer_ols.explain_instance(np.array([1,1,1]), ols_click.predict)
click_6ads4.show_in_notebook(show_table=True)
Female users younger than 24 in the six-ads condition have about a 20% probability of clicking the ad links and an 80% probability of not clicking anything. The test condition raises their probability of clicking, and their age and gender raise it only insignificantly. Of all the combinations, this group has the largest probability of clicking an ad.
# predict mean skip frequency
ols_skip = LinearRegression(fit_intercept = True)
ols_skip.fit(train[['six_ads', 'is_female', 'younger_24']], train['mean_skip'])
lime_skip = train[['six_ads', 'is_female', 'younger_24', 'mean_skip']]
class_names_skip = lime_skip.columns
X_lime_skip = lime_skip[['six_ads', 'is_female', 'younger_24']].to_numpy()
y_lime_skip = lime_skip['mean_skip'].to_numpy()
explainer_ols = lime.lime_tabular.LimeTabularExplainer(
X_lime_skip,
feature_names = class_names_skip,
class_names = ['skip'],
verbose = True,
mode = 'regression',
discretize_continuous=True)
# 3 ads + male + older 24
ols_skip.predict([[0,0,0]])
skip_3ads1 = explainer_ols.explain_instance(np.array([0,0,0]), ols_skip.predict)
skip_3ads1.show_in_notebook(show_table=True)
Male users older than 24 who watch three ads in a row have about a 3.45% probability of skipping an ad (watching it for less than three seconds) and a 96.55% probability of watching it for longer. Watching three instead of six advertisements raises their probability of skipping, while gender and age have only a minor effect. Of all the combinations of demographics and test conditions, this group has the largest probability of skipping an ad, so the ads are least effective for them in terms of skip frequency.
# 3 ads + female + older 24
ols_skip.predict([[0,1,0]])
skip_3ads2 = explainer_ols.explain_instance(np.array([0,1,0]), ols_skip.predict)
skip_3ads2.show_in_notebook(show_table=True)
Female users older than 24 who watch three ads in a row have about a 3.36% probability of skipping an ad (watching it for less than three seconds) and a 96.64% probability of watching it for longer. Watching three instead of six advertisements raises their probability of skipping; their gender (female) lowers it and their age raises it, both insignificantly.
# 3 ads + male + younger 24
ols_skip.predict([[0,0,1]])
skip_3ads3 = explainer_ols.explain_instance(np.array([0,0,1]), ols_skip.predict)
skip_3ads3.show_in_notebook(show_table=True)
Male users younger than 24 who watch three ads in a row have about a 3.43% probability of skipping an ad (watching it for less than three seconds) and a 96.57% probability of watching it for longer. Watching three instead of six advertisements raises their probability of skipping. Being male raises the skip probability slightly, while being younger lowers it slightly; neither effect is large.
# 3 ads + female + younger 24
ols_skip.predict([[0,1,1]])
skip_3ads4 = explainer_ols.explain_instance(np.array([0,1,1]), ols_skip.predict)
skip_3ads4.show_in_notebook(show_table=True)
Female users younger than 24 who watch three ads in a row have about a 3.34% probability of skipping an ad (watching it for less than three seconds) and a 96.66% probability of watching it for longer. Watching three instead of six advertisements raises their probability of skipping; their age and gender both lower it insignificantly.
# 6 ads + male + older 24
ols_skip.predict([[1,0,0]])
skip_6ads1 = explainer_ols.explain_instance(np.array([1,0,0]), ols_skip.predict)
skip_6ads1.show_in_notebook(show_table=True)
Male users older than 24 who watch six ads in a row have about a 1.05% probability of skipping an ad (watching it for less than three seconds) and a 98.95% probability of watching it for longer. Watching six instead of three advertisements lowers their probability of skipping, while gender and age have only a minimal effect.
# 6 ads + female + older 24
ols_skip.predict([[1,1,0]])
skip_6ads2 = explainer_ols.explain_instance(np.array([1,1,0]), ols_skip.predict)
skip_6ads2.show_in_notebook(show_table=True)
Female users older than 24 who watch six ads in a row have about a 0.96% probability of skipping an ad (watching it for less than three seconds) and a 99.04% probability of watching it for longer. Watching six instead of three advertisements lowers their probability of skipping; their gender (female) lowers it and their age raises it, both insignificantly.
# 6 ads + male + younger 24
ols_skip.predict([[1,0,1]])
skip_6ads3 = explainer_ols.explain_instance(np.array([1,0,1]), ols_skip.predict)
skip_6ads3.show_in_notebook(show_table=True)
Male users younger than 24 who watch six ads in a row have about a 1.03% probability of skipping an ad (watching it for less than three seconds) and a 98.97% probability of watching it for longer. Watching six instead of three advertisements lowers their probability of skipping. Being male raises the skip probability slightly, while being younger lowers it slightly; neither effect is large.
# 6 ads + female + younger 24
ols_skip.predict([[1,1,1]])
skip_6ads4 = explainer_ols.explain_instance(np.array([1,1,1]), ols_skip.predict)
skip_6ads4.show_in_notebook(show_table=True)
Female users younger than 24 who watch six ads in a row have about a 0.94% probability of skipping an ad (watching it for less than three seconds) and a 99.06% probability of watching it for longer. Watching six instead of three advertisements lowers their probability of skipping; their age and gender both lower it insignificantly. Of all the combinations of demographics and test conditions, this group has the smallest probability of skipping an ad, so the ads are most effective for them in terms of skip frequency.
# predict mean watch time
ols_watch = LinearRegression(fit_intercept = True)
ols_watch.fit(train[['six_ads', 'is_female', 'younger_24']], train['mean_watch'])
lime_watch = train[['six_ads', 'is_female', 'younger_24', 'mean_watch']]
class_names_watch = lime_watch.columns
X_lime_watch = lime_watch[['six_ads', 'is_female', 'younger_24']].to_numpy()
y_lime_watch = lime_watch['mean_watch'].to_numpy()
explainer_ols = lime.lime_tabular.LimeTabularExplainer(
X_lime_watch,
feature_names = class_names_watch,
class_names = ['watch'],
verbose = True,
mode = 'regression',
discretize_continuous=True)
# 3 ads + male + older 24
ols_watch.predict([[0,0,0]])
watch_3ads1 = explainer_ols.explain_instance(np.array([0,0,0]), ols_watch.predict)
watch_3ads1.show_in_notebook(show_table=True)
If a male user older than 24 watches three ads in a row, he stays on each ad for about 8.62 seconds on average. The test condition (three ads) lowers this time, while his gender (male) and his age (older than 24) lower it only insignificantly. This group has, on average, the shortest watch time per advertisement.
# 3 ads + female + older 24
ols_watch.predict([[0,1,0]])
watch_3ads2 = explainer_ols.explain_instance(np.array([0,1,0]), ols_watch.predict)
watch_3ads2.show_in_notebook(show_table=True)
If a female user older than 24 watches three ads in a row, she stays on each ad for about 8.65 seconds on average. The test condition (three ads) lowers this time; being female raises it and being older lowers it, neither significantly.
# 3 ads + male + younger 24
ols_watch.predict([[0,0,1]])
watch_3ads3 = explainer_ols.explain_instance(np.array([0,0,1]), ols_watch.predict)
watch_3ads3.show_in_notebook(show_table=True)
If a male user younger than 24 watches three ads in a row, he stays on each ad for about 8.63 seconds on average. The test condition (three ads) lowers this time. Being male lowers it slightly relative to a female user, while being younger raises it slightly.
# 3 ads + female + younger 24
ols_watch.predict([[0,1,1]])
watch_3ads4 = explainer_ols.explain_instance(np.array([0,1,1]), ols_watch.predict)
watch_3ads4.show_in_notebook(show_table=True)
If a female user younger than 24 watches three ads in a row, she stays on each ad for about 8.67 seconds on average. The test condition (three ads) lowers this time, while being female and being younger raise it insignificantly.
# 6 ads + male + older 24
ols_watch.predict([[1,0,0]])
watch_6ads1 = explainer_ols.explain_instance(np.array([1,0,0]), ols_watch.predict)
watch_6ads1.show_in_notebook(show_table=True)
If a male user older than 24 watches six ads in a row, he stays on each ad for about 9.48 seconds on average. The test condition (six ads) raises this time, while his gender (male) and his age (older than 24) lower it only insignificantly.
# 6 ads + female + older 24
ols_watch.predict([[1,1,0]])
watch_6ads2 = explainer_ols.explain_instance(np.array([1,1,0]), ols_watch.predict)
watch_6ads2.show_in_notebook(show_table=True)
If a female user older than 24 watches six ads in a row, she stays on each ad for about 9.51 seconds on average. The test condition (six ads) raises this time; being female raises it and being older lowers it, neither by much.
# 6 ads + male + younger 24
ols_watch.predict([[1,0,1]])
watch_6ads3 = explainer_ols.explain_instance(np.array([1,0,1]), ols_watch.predict)
watch_6ads3.show_in_notebook(show_table=True)
If a male user younger than 24 watches six ads in a row, he stays on each ad for about 9.49 seconds on average. The test condition (six ads) raises this time. Being male lowers it slightly relative to a female user, while being younger raises it slightly.
# 6 ads + female + younger 24
ols_watch.predict([[1,1,1]])
watch_6ads4 = explainer_ols.explain_instance(np.array([1,1,1]), ols_watch.predict)
watch_6ads4.show_in_notebook(show_table=True)
If a female user younger than 24 watches six ads in a row, she stays on each ad for about 9.52 seconds on average. The test condition (six ads) raises this time, and being female and being younger raise it insignificantly. This group has, on average, the longest watch time per advertisement.
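The eight prediction cells above differ only in the three dummy values, so they can be generated in one loop. A sketch of that refactor, with a stand-in model fitted on simulated data in place of the already-trained ols_watch:

```python
from itertools import product

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for the fitted watch-time model (ols_watch in the
# notebook); trained here on simulated data so the sketch runs on its own.
rng = np.random.default_rng(1)
X_train = rng.integers(0, 2, size=(500, 3)).astype(float)
y_train = 8.6 + 0.85 * X_train[:, 0] + rng.normal(0, 0.3, 500)
ols_watch = LinearRegression().fit(X_train, y_train)

# One loop replaces the eight near-identical prediction cells.
predictions = {}
for combo in product([0, 1], repeat=3):  # (six_ads, is_female, younger_24)
    predictions[combo] = ols_watch.predict([list(combo)])[0]
    # a per-combination explainer_ols.explain_instance call could go here too
```

The same loop structure works for the click and skip models, which keeps the three sections consistent and avoids copy-paste mistakes such as reused variable names.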
Main effect
Generally speaking, the number of advertisements pushed to users significantly changes the effectiveness of the ads: compared with pushing three ads, pushing six ads increases users’ tendency to click the ads, decreases their probability of skipping the ads, and increases their watch time for each ad.
Moderation effect
In theory, being female and being younger increase the effectiveness of the ads: a younger female user is more likely to click the links, less likely to skip an ad, and watches an ad for longer than other groups of users. However, the differences between users of different ages and genders are so slight that they can be hard to detect.
Note: this section is based on the premise that the dataset used is not simulated.
To evaluate the models properly, additional models with more or fewer moderators would be needed for comparison. However, both the regressions and the machine-learning predictions show that the proposed moderators (age and gender) do not influence the result significantly, which means that adding or removing moderators would not change the models much.
Here, the confusion matrices are used only to calculate the F1 scores and to make brief interpretations.
# click frequency
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

logit_users = LogisticRegression(max_iter=1000, fit_intercept = True)
logit_users.fit(train[['six_ads', 'is_female', 'younger_24']], train['click_zero'])
test['predict_click_zero'] = logit_users.predict(test[['six_ads', 'is_female', 'younger_24']])
test['predict_click_zero'].value_counts()
In the test dataset, 2,510 users are predicted to click the ads, while the remaining 2,508 are predicted not to click.
print(confusion_matrix(test['click_zero'], test['predict_click_zero']))
print(classification_report(test['click_zero'], test['predict_click_zero']))
The report shows that, of the users the model predicts will click, about 61% actually do (precision), and of all users who actually click the ads, the model finds about 70% (recall). The F1 score for this model is 0.67.
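For readers checking the report’s numbers, precision, recall, and F1 are simple ratios of the confusion-matrix cells. A small sketch on toy labels (hypothetical, not the test data):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Toy labels: 1 = user clicked, 0 = user did not click.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
precision = tp / (tp + fp)  # of predicted clickers, how many really clicked
recall = tp / (tp + fn)     # of real clickers, how many the model found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
```

These hand-computed values match what `classification_report` prints for the positive class.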
# skip frequency
logit_users.fit(train[['six_ads', 'is_female', 'younger_24']], train['skip_zero'])
test['predict_skip_zero'] = logit_users.predict(test[['six_ads', 'is_female', 'younger_24']])
test['predict_skip_zero'].value_counts()
In the test dataset, all 5,018 users are predicted not to skip the ads.
print(confusion_matrix(test['skip_zero'], test['predict_skip_zero']))
print(classification_report(test['skip_zero'], test['predict_skip_zero']))
Because every user is predicted not to skip, the model is right for about 92% of cases (the share of users who indeed did not skip), and it finds all of the non-skipping users while missing every actual skipper. The F1 score reported for this model is 0.92, but this mainly reflects the class imbalance rather than genuine predictive power.
# watch time
logit_users.fit(train[['six_ads', 'is_female', 'younger_24']], train['watch_nine'])
test['predict_watch_nine'] = logit_users.predict(test[['six_ads', 'is_female', 'younger_24']])
test['predict_watch_nine'].value_counts()
In the test dataset, 2,510 users are predicted to watch an ad for over nine seconds, while the remaining 2,508 are predicted to watch it for less than nine seconds.
print(confusion_matrix(test['watch_nine'], test['predict_watch_nine']))
print(classification_report(test['watch_nine'], test['predict_watch_nine']))
The report shows that, of the users predicted to watch the ads for over nine seconds, about 62% actually do (precision), and of all users who actually watch for over nine seconds, the model finds about 61% (recall). The F1 score for this model is 0.61.
In this study, the A/B test and the collected data are used to explore the relationship between the number of advertisements Douyin users watch in a row and the effectiveness of the ads. In general, hypothesis 1 is rejected, and given the current data, hypotheses 2 and 3 cannot be confirmed.
| Hypothesis | Test result |
|---|---|
| H1 | Rejected |
| H1a | Rejected |
| H1b | Rejected |
| H1c | Rejected |
| H2 | Not confirmed |
| H2a | Not confirmed |
| H2b | Not confirmed |
| H2c | Not confirmed |
| H3 | Not confirmed |
| H3a | Not confirmed |
| H3b | Not confirmed |
| H3c | Not confirmed |
From the data and analysis, it can be concluded that the number of ads users watch in a row strongly influences the ads’ effectiveness, namely:
Overall, as the number of advertisements increases, the effectiveness of the ads also increases.
The analysis also found that gender and age, the two possible covariates of this relationship, influence the effectiveness of the ads only slightly. Both influences were statistically insignificant, so the effect of gender and age on this relationship can, in fact, be difficult to detect.
For the summarized visualization, the three DVs need to be combined into a single effectiveness score. Higher ad effectiveness is shown by a higher mean_click, a lower mean_skip, and a higher mean_watch. Therefore, the basic rule for combining the three DVs can be
mean_click - mean_skip + mean_watch.
However, the means of the three DVs differ greatly. On average, users spent about 9 seconds on one ad, while their mean probability of skipping an ad was only 1.79%. If the DVs were simply summed up, differences in watch time would dominate the overall score far more than skip frequency. Therefore, the following mean-centered equation is used to create effectiveness:
effectiveness = [mean_click - mean(mean_click)] - [mean_skip - mean(mean_skip)] + [mean_watch - mean(mean_watch)]
# create the overall score
users['effectiveness'] = (users['mean_click'] - 0.17) + (users['mean_watch'] - 9.08) - (users['mean_skip'] - 0.02)
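One fragile detail in the line above is that the means (0.17, 9.08, 0.02) are hard-coded and will silently drift if the data changes. A sketch of the same centering computed from the data itself (the frame here is simulated, not the real users table):

```python
import numpy as np
import pandas as pd

# Simulated stand-in for the users table.
rng = np.random.default_rng(2)
users = pd.DataFrame({
    'mean_click': rng.uniform(0, 0.4, 100),
    'mean_skip': rng.uniform(0, 0.05, 100),
    'mean_watch': rng.uniform(8, 10, 100),
})

# Center each DV on its own mean, then combine with the report's signs.
dvs = users[['mean_click', 'mean_skip', 'mean_watch']]
centered = dvs - dvs.mean()
users['effectiveness'] = (centered['mean_click']
                          - centered['mean_skip']
                          + centered['mean_watch'])
```

By construction the resulting score averages to zero, which also makes the distribution plot easier to read around a common center.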
# check the distribution of the overall score
# (distplot is deprecated in recent seaborn; histplot is its replacement)
sns.histplot(users['effectiveness'], kde=True)
# change the values of sex so the legends of the summarized visualization will be clearer
users['sex'] = users['sex'].map({1: 'Female', 0: 'Male'})
# create summarized visualization
facet_summ = sns.FacetGrid(users, col = 'younger_24')
facet_summ.map_dataframe(sns.lineplot, x = 'condition', y = 'effectiveness', hue = "sex", err_style = None, legend = "full")
facet_summ.set_axis_labels('Condition (Number of Ads)', 'Ads Effectiveness')
facet_summ.set(xlim=(3, 6), xticks=[3,6])
axes_facet = facet_summ.axes.flatten()
axes_facet[0].set_title('Older than 24 years old')
axes_facet[1].set_title('Younger than 24 years old')
facet_summ.add_legend()
# sum up columns to calculate CTR
users.sum(axis=0)
CTR is calculated as click frequency divided by the overall number of views. Here, instead of the CTR of an individual ad, the overall CTR is calculated: the sum of click_freq is divided by the sum of condition (i.e., how many ads in total were shown to users during the test).
CTR = sum(click_freq) / sum(condition) = 20,494 / 112,422 ≈ 18.23%
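The same overall CTR can be computed directly from the two per-user columns; a sketch on a hypothetical four-user frame:

```python
import pandas as pd

# Hypothetical per-user frame: 'condition' = how many ads the user was
# shown in the test, 'click_freq' = how many of them the user clicked.
users = pd.DataFrame({'condition': [3, 6, 3, 6],
                      'click_freq': [1, 2, 0, 1]})

# Overall CTR: total clicks over total ad views across all users.
overall_ctr = users['click_freq'].sum() / users['condition'].sum()
print(f"{overall_ctr:.2%}")
```

With the real column sums (20,494 clicks over 112,422 views) this division yields the 18.23% reported above.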
Although the results of the confusion matrices are not fully satisfactory, the analysis can still provide valuable suggestions to Douyin. The positive link between the number of ads and their effectiveness reveals that Douyin’s ads do not repel its users; on the contrary, users tend to be more engaged when they watch more ads in a row. This finding gives Douyin room to push more advertisements to its users. Note, however, that although the relationship was positive in this test, users may grow impatient if the platform pushes too many ads (e.g., ten in a row). Therefore, the safest option before the next test is for Douyin to show users of all age and gender groups six in-feed ads in a row to achieve the greatest advertising effect.
Apart from the hypothesis tests, the process also surfaced facts about Douyin’s in-feed ads that can affect the company’s earning power. The average CTR in the test is 18.23%, which is far higher than that of other social media platforms in 2020 (Statista, 2019). Not only do ads on Douyin have a relatively high CTR, but their mean watch time is also long (about 9 seconds). Together, these figures indicate that video ads on Douyin are more likely to be clicked than ads on other platforms, and the average watch time is long enough for users to learn the general features of the advertised products. The company can present these data when attracting new advertisers in the future to increase its advertising revenue.
Note: this section is based on the premise that the dataset used is not simulated.
Although the test yielded a significant effect for the main relationship, it still has limitations.
Firstly, no moderation effect was found in the test dataset. Combined with the fact that the regression models used to test the hypotheses all had low R² values (meaning the models explained only a small part of the effect), this may imply that the wrong moderators were chosen for the test. Because users’ gender and age were expected to moderate the effect, only these two kinds of data were collected during the test. In the next test, data of more categories can be collected and other kinds of moderation effects explored.
Secondly, the type of regression model chosen in the analysis was not the most suitable one. As mentioned above, this test included three DVs: the mean click frequency, the mean skip frequency, and the mean watch time. Linear regression was unproblematic for watch time, a numerical variable. However, the other two DVs, the average click and skip frequencies, although mean scores, were calculated from categorical variables, and their descriptive statistics show that they take only a limited number of values. One option would have been to dichotomize these DVs and apply logistic regression, but this was not chosen because reducing the variables to two categories would lose much information. A better solution would have been to treat the two variables as categorical and use regression types that accept categorical DVs to obtain more precise results.
Thirdly, the confusion matrices show low F1 scores for the models. The low values indicate that the models cannot reach high precision and recall at the same time, which means they do not fit the data well. After the test, the team that conducted it should retrain the models.
Once the test and analysis are completed, the test data should be handled cautiously. The company should either store the data on an encrypted server, delete it after the period stated in the informed consent, or delete it immediately after the test report is finished. Moreover, a detailed plan for the next A/B test can be drawn up, and the consent form for that test should be written and updated based on the current version.
The company can indeed conduct more A/B tests to explore advertising effectiveness. As the results indicate a positive relationship between the number of ads and their effectiveness, it is worth exploring the possible ceiling of this relationship: future tests can be conducted with more ads. For example, the next A/B test could compare the effectiveness of watching six versus ten ads in a row. The pattern of pushing ads can also be studied further. In this test, advertisements were pushed to users continuously, meaning that users saw only advertisements after the fourth in-feed video appeared on their homepage. The next test could also compare the effectiveness of continuously pushed ads with ads pushed at intervals.
A/B testing is an area that requires close attention to ethical issues. Finn and Wadhwa (2014) describe possible invasions of users’ privacy during personalized advertising. In the test recently conducted by Douyin, users’ data was treated in controversial ways.
Firstly, although no private posts or videos were collected during the test, users and their ad preferences can still be identified through the combination of IP address, gender, and age collected by the company, and these data can be used to target them. The test also treated users as commodities: although users do not pay to use the app, their data is used to make money. As mentioned above, with no proper informed consent signed by or shown to users, the participants had no way to discover their situation and thus no way to refuse it.
Moreover, pushing personalized ads may change users’ behavior: if Douyin increases the number of ads shown to users, they can be manipulated more easily. In general, the possible consequences of the test include the identification, objectification, exploitation, and manipulation of users.
Although users will not be singled out on the basis of the test result, and the test was overseen by staff, the process can still harm users and has low explicability. It does not fully meet the ethical principles set out in the Ethics Guidelines for Trustworthy AI (High-Level Expert Group on Artificial Intelligence, 2019). One unintended consequence could be data leakage, of which there have been multiple social media examples in recent years. A report from CPO Magazine (Ikeda, 2020), a platform focused on privacy protection, reveals a growing number of data breaches caused by carelessness. A lesson to be learned is that Douyin needs to store the data carefully and keep user data anonymous during analysis. Meanwhile, to achieve a higher level of explicability, the company needs clearer user consent before conducting the next A/B test and must stay transparent about its purposes.
A/B tests can raise concerns because of missing informed consent, intentional deception, and the lack of protection for human subjects (Benbunan-Fich, 2016). The situation requires more oversight from society. A good example is the General Data Protection Regulation (GDPR), which lays down principles for processing and preserving data ethically (European Parliament and Council of the European Union, 2016). Regulations playing the same role as the GDPR should also be introduced in China to regulate Chinese companies, including Douyin.
Individual users, for their part, should raise their awareness of data protection and realize that they can choose freely and have the right to be treated by companies fairly and without discrimination.
This report explains and concludes an A/B test, but it also takes the responsibility of reminding readers of the crucial role of ethical algorithms and ethical data processing.
Amos, C., Holmes, G., & Strutton, D. (2008). Exploring the relationship between celebrity endorser effects and advertising effectiveness. International Journal Of Advertising, 27(2), 209-234. doi: 10.1080/02650487.2008.11073052
Benbunan-Fich, R. (2016). The ethics of online research with unsuspecting users: From A/B testing to C/D experimentation. Research Ethics, 13(3-4), 200-218. doi: 10.1177/1747016116680664
Bleier, A., & Eisenbeiss, M. (2015). Personalized Online Advertising Effectiveness: The Interplay of What, When, and Where. Marketing Science, 34 (5), 669-688. doi: 10.1287/mksc.2015.0930
Celuch, K., Singley, R., & Slama, M. (2014). Gender and AD Effectiveness: The Role of Social Risk. Proceedings Of The 1994 Academy Of Marketing Science (AMS) Annual Conference, 377-381. doi: 10.1007/978-3-319-13162-7_103
European Parliament and Council of the European Union. (2016). General Data Protection Regulation (GDPR) – Official Legal Text. Retrieved 20 March 2021, from https://gdpr-info.eu/
Forbes. (2020). TikTok: Why the enormous success? Retrieved from https://www.forbes.com/sites/tomtaulli/2020/01/31/tiktok-why-the-enormous-success/?sh=1acce88765d1
High-level Expert Group on Artificial Intelligence. (2019). Ethics Guidelines for Trustworthy AI. European Commission. Retrieved from https://www.aepd.es/sites/default/files/2019-12/ai-ethics-guidelines.pdf
Ikeda, S. (2020). Major Data Broker Exposes 235 Million Social Media Profiles in Data Leak. Retrieved 20 March 2021, from https://www.cpomagazine.com/cyber-security/major-data-broker-exposes-235-million-social-media-profiles-in-data-leak/#:~:text=The%20data%20leak%2C%20which%20contains,by%20security%20researchers%20with%20Comparitech.&text=This%20particular%20data%20leak%20contains,of%20Instagram%2C%20TikTok%20and%20YouTube.
Finn, R. L., & Wadhwa, K. (2014). The ethics of “smart” advertising and regulatory initiatives in the consumer intelligence industry. Info, 16(3), 22-39. doi: 10.1108/info-12-2013-0059
Lucas, D., & Britt, S. (1963). Measuring advertising effectiveness. doi: 10.1037/13112-000
Maček, A., Bobek, V., Kubli, V., & Burböck, B. (2019). Effects of different types of framing in advertising messages on human decision behaviour. International Journal of Diplomacy and Economy, 5(1), 27. doi: 10.1504/ijdipe.2019.10020206
Martínez, J., & Mascarua, M. (2016). Exploring the Connections Between Age, Advertising Effectiveness and Purchase Intention in a Traditional Media Setting (Magazines) Versus a Digital Media Setting (Digital Magazines) (in México). SSRN Electronic Journal. doi: 10.2139/ssrn.3437106
Nasiri, S., Sammaknejad, N., & Sabetghadam, M. (2020). The effect of human face and gaze direction in advertising. International Journal of Business Forecasting and Marketing Intelligence, 6(3), 221. doi: 10.1504/ijbfmi.2020.111373
Petty, R., & Cacioppo, J. (1986). The Elaboration Likelihood Model of Persuasion. Advances in Experimental Social Psychology, 19, 123-205. doi: 10.1016/s0065-2601(08)60214-2
Phillips, D., & Stanton, J. (2004). Age-related differences in advertising: Recall and persuasion. Journal of Targeting, Measurement and Analysis for Marketing, 13(1), 7-20. doi: 10.1057/palgrave.jt.5740128
Reuters. (2020). Exclusive: TikTok-owner ByteDance to rake in $27 billion ad revenue by year-end: sources. Retrieved from https://www.reuters.com/article/china-bytedance-revenue-idUSKBN27R191
Segijn, C., & Eisend, M. (2019). A Meta-Analysis into Multiscreening and Advertising Effectiveness: Direct Effects, Moderators, and Underlying Mechanisms. Journal of Advertising, 48(3), 313-332. doi: 10.1080/00913367.2019.1604009
Statista. (2019). Global social video ads CTR by platform and age 2019. Retrieved 13 March 2021, from https://www.statista.com/statistics/1048619/social-video-ads-clickthrough-rate-platform-age-worldwide/
Stewart, K., Kammer-Kerwick, M., Auchter, A., Koh, H., Dunn, M., & Cunningham, I. (2019). Examining digital video advertising (DVA) effectiveness. European Journal of Marketing, 53(11), 2451-2479. doi: 10.1108/ejm-11-2016-0619
Tankovska, H. (2021). Most popular social networks worldwide as of January 2021, ranked by number of active users. Retrieved 12 March 2021, from https://www.statista.com/statistics/272014/global-social-networks-ranked-by-number-of-users/
The Infinite Agency. (2021). Tik Tok Advertising: How Brands Are Using Tik Tok. Retrieved 13 March 2021, from https://theinfiniteagency.com/insights/social/tapping-into-tiktok-as-a-branding-platform/
The New York Times. (2019). How TikTok Is Rewriting the World. Retrieved from https://www.nytimes.com/2019/03/10/style/what-is-tik-tok.html